arXiv:1503.06699v2 [cs.CV] 9 Apr 2015 


Video-Based Action Recognition Using Rate-Invariant 
Analysis of Covariance Trajectories 

Zhengwu Zhang^ Jingyong Su^, Eric Klassen^, Huiling Le and Anuj 

Srivastava^ 

^Department of Statistics, Florida State University 
^Department of Mathematics and Statistics, Texas Tech University 
^Department of Mathematics, Florida State University 
"^School of Mathematics Sciences, University of Nottingham 

April 13, 2015 


Abstract 

Statistical classification of actions in videos is mostly performed by extracting relevant features, 
particularly covariance features, from image frames and studying time series associated with 
temporal evolutions of these features. A natural mathematical representation of activity videos 
is in form of parameterized trajectories on the covariance manifold, i.e. the set of symmetric, 
positive-definite matrices (SPDMs). The variable execution-rates of actions implies variable 
parameterizations of the resulting trajectories, and complicates their classification. Since ac¬ 
tion classes are invariant to execution rates, one requires rate-invariant metrics for comparing 
trajectories. A recent paper represented trajectories using their transported square-root vec¬ 
tor fields (TSRVFs), defined by parallel translating scaled-velocity vectors of trajectories to a 
reference tangent space on the manifold. To avoid arbitrariness of selecting the reference and 
to reduce distortion introduced during this mapping, we develop a purely intrinsic approach 
where SPDM trajectories are represented by redefining their TSRVFs at the starting points of 
the trajectories, and analyzed as elements of a vector bundle on the manifold. Using a natu¬ 
ral Riemannain metric on vector bundles of SPDMs, we compute geodesic paths and geodesic 
distances between trajectories in the quotient space of this vector bundle, with respect to the re¬ 
parameterization group. This makes the resulting comparison of trajectories invariant to their 
re-parameterization. We demonstrate this framework on two applications involving video clas¬ 
sification: visual speech recognition or lip-reading and hand-gesture recognition. In both cases 
we achieve results either comparable to or better than the current literature. 


Keywords: Action recognition, covariance manifold, trajectories on manifolds, vector bun¬ 
dles, rate-invariant classification 
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1 Introduction 


The problem of classification of human actions or activities in video sequences is both impor¬ 
tant and challenging. It has applications in video surveillance, lip reading, pedestrian tracking, 
hand-gesture recognition, manufacturing quality control, human-machine interfaces, and so on. 
Since the size of video data is generally very high, the task of video classification is often 
performed by extracting certain low-dimensional features of interest - geometric, motion, col¬ 
orimetric features, etc - from each frame and then forming temporal sequences of these features 
for full videos. Consequently, analysis of videos get replaced by modeling and classification 
of longitudinal observations in a certain feature space. (Some papers discard temporal struc¬ 
ture by pooling all feature together but that represents a severe loos of information.) Since 
many features are naturally constrained to lie on nonlinear manifolds, the corresponding video 
representations form parameterized trajectories on these manifolds. Examples of these mani¬ 
folds include unit spheres, Grassmann manifolds, tensor manifolds, and the space of probability 
distributions. 

One of the most commonly used and effective feature in image analysis is a covariance 
matrix, as shown via applications in medical imaging lUlE]] and computer vision |l3l|4l|5l|6l|71[8l. 
These matrices are naturally constrained to be symmetric positive-definite matrices (SPDMs) 
and have also played a prominent role as region descriptors in texture classification, object 
detection, object tracking, action recognition and face recognition. Tuzel et al. [[ 3 ]] introduced 
the concept of covariance tracking where they extracted a covariance matrix for each video 
frame and studied the temporal evolution of this matrix in the context of pedestrian tracking in 
videos. Since the set of SPDMs is a nonlinear manifold, denoted by V, a whole video segment 
can be represented as a (parameterized) trajectory on V. In this paper we focus on the problem 
of classification of actions or activities by treating them as parameterized trajectories on V. 
The two specific applications we will study are: visual-speech recognition and hand-gesture 
classification. Fig. [T] illustrates examples of video frames for these two applications. 

One challenge in characterizing activities as trajectories comes from the variability in exe¬ 
cution rates. The execution rate of an activity dictates the parameterization of the corresponding 
trajectory. Even for the same activity performed by the same person, the execution rates can 
potentially differ a lot. Different execution rate implies that the corresponding trajectories go 
through the same sequences of points in V but have different parameterizations. Directly ana¬ 
lyzing such trajectories without alignment, e.g. comparing the difference, calculating point-wise 
mean and covariance, can be misleading (the mean is not representative of individual trajecto¬ 
ries, and the variance is artificially inflated). 

To make these issues precise, we develop some notation first. Let « : [ 0 , 1 ] ^ P be a 
trajectory and let 7 : [0,1] [0,1] be a positive diffeomorphism such that 7(0) = 0 and 

7(1) = 1 . This 7 plays the role of a time-warping function, or a re-parameterization function, 
so that the composition a o 7 is now a time-warped or re-parameterized version of a. In other 
words, the trajectory 007 goes through the same set of points as a but at a different rate (speed). 

1 . Pairwise Registration: Now, let q;i,q;2 : [ 0 , 1 ] —)■ P be two trajectories on P. The 
process of registration of ai and 0:2 is to find a warping 7 such that ai (t) is optimally 
registered to 0:2(7(f)) for all t e [ 0 , 1 ]. In order to ascribe a meaning to optimality, we 
need to develop a precise criterion. 
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Figure 1 : Examples of video frames in visual-speech recognition (first two rows) and hand- 
gesture classification (last two). 


2 . Groupwise or Multiple Registration: This problem can be extended to more than two 
trajectories: let ai, 0:2,..., be n trajectories on V, and we want to find out time warp- 
ings 7i, 72, • • •, 7n such that for all t, the variables {Q;i(7i(f))}”=! are optimally registered. 
A solution for pairwise registration can be extended to the multiple alignment problem as 
follows - for the given trajectories, first define a template trajectory and then align each 
given trajectory to this template in a pairwise fashion. One way of defining this template 
is to use the mean of given trajectories under an appropriately chosen metric. 

Notice that the problem of comparisons of trajectories is different from the problem of curve 
fitting or trajectory estimation from noisy data. Many papers have studied spline-type solutions 
for fitting curves to discrete, noisy data points on manifolds || 9 l[l 0 l[IIl[l 2 l[l 3 ]| but in this paper 
we assume that the trajectories are already available through some other means. 

1.1 Past Work & Their Limitations 

There are very few papers in the literature for analyzing, in the sense of comparing, averaging 
or clustering, trajectories on nonlinear manifolds. Let df, denote the geodesic distance resulting 
from the chosen Riemannian metric on "P. It can be shown that the quantity dp(ai(t), a2 (t))dt 
forms a proper distance on the set the space of all trajectories on V. For example, ifTdll 
uses this metric, combined with the arc-length distance on S^, to cluster hurricane tracks. How¬ 
ever, this metric is not immune to different temporal evolutions of hurricane tracks. Handling 
this variability requires performing some kind of temporal alignment. It is tempting to use the 
following modification of this distance to align two trajectories: 

inf dp{ai{t),a2{^{t)))dt^ , (1) 

but this can lead to degenerate solutions (also known as the pinching problem, described for 
real-valued functions in ifTSll l. Pinching implies that a severely distorted 7 is used to eliminate 
(or minimize) those parts of 0:2 that do not match with ai, which can be done even when 0:2 is 
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mostly different from «i. While this degeneracy can be avoided using a regularization penalty 
on 7, some of the other problems remain, including the fact that the solution is not symmetric. 

A recent solution, presented in lfT6l | 4 ]|, develops the concept of elastic trajectories to deal 
with the parameterization variability. It represents each trajectory by its transported square-root 
vector field (TSRVF) defined as: 



where c is pre-determined reference point on V and ^ denotes a parallel transport of the vec¬ 
tor a{t) from the point a{t) to c along a geodesic path. This way a trajectory can be mapped 
into the tangent space 7^(P) and one can compare/align them using the norm on that vector 
space. More precisely, the quantity inf.^ Il^ai — ^0307 1 | provides not only a criterion for optimal¬ 
ity of 7 but also approximates a proper metric for averaging and other statistical analyses. (The 
exact metric is based on the use of semigroup T, the set of all absolutely continuous, weakly 
increasing functions, rather than T. We refer the reader to ifT^ for details.) This TSRVF rep¬ 
resentation is an extension of the SRVF used for elastic shape analysis of curves in Euclidean 
spaces ma. One limitation of this framework is that the choice of reference point, c, is left 
arbitrary. The results can potentially change with c and make it difficult to interpret the results. 
A related, and bigger issue, is that the transport of tangent vectors a{t) to c, along geodesics, 
can introduce large distortion, especially when the trajectories are far from c on the manifold. 

1.2 Our Approach 

We present a different approach that does not require a choice of c. Here the trajectories are 
represented by their transported vector fields but without a need to transport them to a global 
reference point. For each trajectory a,, the reference point is chosen to be its starting point 
«j( 0 ), and the transport is performed along the trajectory itself. In other words, for each t, the 
velocity vector is transported along a to the tangent space of the starting point «j( 0 ). This 
idea has been used previously in ifTTIl and others for mapping trajectories into vector spaces, 
and results in a relatively stable curve, with smaller distortion than the TSRVFs of ifT^ . We 
then develop a metric-based framework for comparing, averaging, and modeling such curves in 
a manner that is invariant to their re-parameterizations. Consequently, this framework provides 
a natural solution for removal of rate, or phase, variability from trajectory data. 

Another issue is the choice of Riemannian metric on V. Although a larger number of Rie- 
mannian structures and metrics have been used for V in past papers and books ifT^fTOll . they 
do not provide all the mathematical tools we will need for ensuing statistical analysis. Con¬ 
sequently, we use a different Riemannian structure than those previous papers, a structure that 
allows relevant mathematical tools for applying the proposed framework. 

The rest of this paper paper is organized as follows. In Section]^ we introduce a framework 
of aligning, averaging and comparing of trajectories on a general manifold M. Since we mainly 
focus on V, in Section we introduce the details of the Riemannian structure on V used to 
implement the comparison framework described in Section]^ In Section]^ we demonstrate the 
proposed work with real-world action recognition data involving two applications: lip-reading 
and hand-gesture recognition. 
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2 Riemannian Structure on V 


Here we will discuss the geometry of V and impose a Riemannian structure that facilitates 
our analysis of trajectories on V. Most of the background material is derived in Appendix [a| 
with only the final expressions noted here. For a beginning reader in differential geometry, we 
strongly recommend reading Appendix [A| first. 

We start by choosing an appropriate Riemannian metric on V. Then, from the resulting 
structure, we derive expressions for the following: (1) geodesic paths between arbitrary points 
on P; (2) parallel transport of tangent vectors along geodesic paths; ( 3 ) exponential map; ( 4 ) 
inverse exponential map; and ( 5 ) Riemannian curvature tensor. Several past papers have studied 
the space of SPDMs as a nonlinear manifold and imposed a metric structure on that manifold 
While they mostly focus on defining distances, a few of them originate from 
a Riemannian structure with expressions for geodesics and exponential maps. However, they 
do not provide expressions for all desired items (parallel transport and Riemannian curvature 
tensor). In this section, we utilize a particular Riemannian structure on P, which is similar but 
not identical to the one in |l 2 ]], and is particularly convenient for our purposes. This Riemannian 
structure has been used previously for other applications such as spline-fitting in ifT^ . 

Let P be the space of n x n SPDMs, and let V be its subset of matrices with determinant one. 
The tangent spaces of P and V at I, where / is the identity matrix, are TpP) = = A} 

and Ti{V) = p{n) = {A\A^ = A and tr(A) = 0}. The exponential map at / can be shown to be 
the standard matrix exponential: for any P e 7 ^ (P G P), there is A G TiiV) {A G TiiP)) such 
that P = (P = e^), where denotes the matrix (Lie) exponential; this relationship is one- 
to-one. Our approach is first to identify the space P with the quotient space SL{n)/SO{n) and 
borrow the Riemannian structure from the latter directly. Then, we straightforwardly extend 
the Riemannian structure on V to P. The Riemannian geometries of SL{n) and its quotient 
space SL{n)/SO{n) are discussed in Appendix [ a} As described there, the Riemannian metric 
at any point G is defined by pulling back the tangent vectors under G~^ to /, and then using the 
trace metric (see Eqn. [^. This definition leads to expressions for exponential map, its inverse, 
parallel transport of tangent vectors, and the Riemannian curvature tensor on SL{n). It also 
induces a Riemannian structure on the quotient space SL{n)/SO{n) in a natural way because 
it is invariant to the action of SO{n) on SL{n). 

2.1 Riemannian Structure on V 

To make the connection with P, we state a useful result termed the polar decomposition of 
square matrices. Recall that for any square matrix G G SL{n), one can decompose it uniquely 
as G = PS where P is a SPDM with determinant one and S G SO{n). We note in passing that 
this fact makes P a section of SL{n) under the action of SO{n). (If a group acts on a manifold, 
then a section of that action is defined to be a subset of the manifold that intersects each orbit 
of the action in at most one point. It is easy to see that P satisfies that condition for the action 
of SO{n) on SL{n). For any SO{n) orbit [G], P intersects that orbit in only one point given 
by P such that G = PS.) We also note that this section is not an orthogonal section since it is 
not perpendicular to the orbit, i.e., the tangent space Tp{V) is not perpendicular to the tangent 
space 7 p([G]). We will identify P with the quotient space SL{n)/SO{n) via a map tt defined 
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as: 


TT : SL{n)/SO{n) P,7r([G']) = , 

for any G G [G]. One can check that this map is well defined and is a diffeomorphism. This 
square-root is the symmetric, positive-definite square-root of a symmetric matrix. One can 
verify that 7r([G]) lies in V by letting G = PS (polar decomposition), and then 7r([G]) = 

\/GG* = VPSS^P = P. The inverse map of vr is given by: 7r“^(P) = [P] = {PS'| 5 ' G 
SO{n)} G SL{n)/SO{n). This establishes a one-to-one correspondence between the quotient 
space SL{n)/SO{n) and V. We will use the map vr to push forward the chosen Riemannian 
metric from the quotient space SL{n)/SO{n) ioV. 

Geodesic between two points on V: With that induced Riemannian metric, we can derive the 
geodesic path and the geodesic distance between any Pi and P2 in P. The idea is to pullback 
these points into the quotient space using TT compute the geodesic there and then map the 
result back to P using tt. As mentioned earlier 7r“^(Pi) = [Pi] and 7r“^(P2) = [P2]. The 
expression for geodesic between these points in the quotient space is given in Appendix [A] We 
compute Ai2 G p(n) such that = P“^P2S'i2 for some 5 'i2 G SO{n). (hetPu = Pf^P2*S'i2- 
Note that P12 G P, and the rotation matrix 5'i2 brings Pf ^P2 to P.) The corresponding geodesic 
in the quotient space (SL{n)/SO{n)) is given by f 1-^ [Pie*^^^] and, therefore, the desired 
geodesic in P is 

t ^ 7 r([Pie*^i 2 ]) = v^Pie2i^i2Pi . 

The corresponding geodesic distance is d{Pi, P2) = d{I, P12) — 11^1211- 

Parallel transport of tangent vectors along geodesic paths on P: To determine the parallel 
transport along the geodesic path t i-?- 7 T{[Pie^^^^]), we recall from Appendix [ a| that the two 
orthogonal subspaces of 5[(n): so{n) and p(n), which we call the vertical and the horizon¬ 
tal tangent subspaces, respectively. We identify the tangent space Tp{V) with the horizontal 
subspace in Tp{SL{n)), so that Tp{V) = {PA\A G p{n)}. 

Now let X G 7 pi (P) be a tangent vector that needs to be translated along a geodesic path 
from Pi to P2, given by f 1-^ Similar to the case of quotient space in Appendix [ a| 

let B G p(n) such that X is identified with P^B. In the quotient space, the parallel transport 
of X along the geodesic is a vector field t i-> Pie^^^^B. In P, the geodesic is obtained by the 
forward map tt, f 7r([Pie*^i2]) Therefore, the parallel transport vector field in P along the 
geodesic is the image of PiP^^^B under dir: 

t ^ d 7 r{PiB^^^B) = 7 r{[PiB^^^])T{t)BT{ty, 

where T(f) G SO(n) andw{[PiP^^^])T{t) = P^P^^S 

Riemannian curvature tensor on P: Let X, Y and Z be three vectors on Tp{V), then the 
Riemannian curvature tensor on P with these arguments can be calculated in the follows. Let 
A, B and G be elements in p(n) such that X = PA, Y = PB and Z = PC, the tensor is given 
by: 

R{X,Y)iZ) = -[[X,F],Z] = -P[[A,B],C]. 

We summarize these mathematical tools for trajectories analysis on the manifold P: 
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1 . Exponential map: Give a point P eV and a tangent vector V G TpiV), the exponential 
map is given as: expp(l/) = \/ 

2 . Geodesic distance: For any two points Pi, P2 G P, the geodesic distance between them 

in is given by: d-p{P\,P2) = d-p{I,Pi2) = ||^i2|| where G V and P12 = 

PPWP- 

3 . Inverse exponential map: For any Pi,P2 G V, t he inverse e xponential map can be 
calculated using the formula: expp^'(P 2 ) = Piiog(VPr'PlPr'). 

4. Parallel transport: Given Pi,P2 G V and a tangent vector V G Tp^{V), the tangent 
vector at P2 which is the parallel transport of V along the shortest geodesic fro m Pi to P 2 
is: P2Tf^BTi2, where B = Pf V, ri2 = Pr2'Pr'P2 and P12 = ^JP{^PiP{\ 

5 . Riemannian curvature tensor: For any point P G P and tangent vectors X, Y and 
Z G Tp(V), the Riemannian curvature tensor R{X,Y){Z) is given as: R{X,Y){Z) = 
-P[[A, B],C], where A = p-^X, B = p-^F, C = P-^Z and [A, B]=AB- BA. 

2.2 Extension of Riemannian Structure to V 

We now extend the Riemannian structure on V to V. Since for any P G P we have det(P) > 0 , 
we can express P = {P,^ log(det(P))) with P = G P. Thus, P is identified with the 

product space of P x M. Moreover, for any smooth function ip on P, the Riemannian metric 
9(p,x) = ip‘^dx^ + gp, where g{p,x) and gp denote the Riemannian metrics on P and P respec¬ 
tively, gives P the structure of a “warped” Riemannian product. In this paper we set ip = ^/n. 

Geodesic between points on P: Using the above Riemannian metric (g(p,x)) on P, the resulting 
squared distance on P between / and P is: 

dp{I, Pf = dp{I, Pf + -(log(det(P)))2. 

n 


For two points Pi and P2, let P12 = Pi ^P2*S'i2 for some S'12 G SO{n), and P12 G P, we have 
det(i\2) = det(P2)/det(Pi). Therefore, the resulting squared geodesic distance between two 
SPDMs Pi and A is df,{Pu A)" = P^f 

= dpiI,Pi2f + -(log(det(A)) - log(det(A)))'. 

n 

Next, let (p = (01, (P2) denote the geodesic in P, where 0 i is a geodesic on P and 02 is a 
geodesic in E. The geodesic 0 i from Pi to P2 in P is given by 0 i(f) = '/PP^^^Bpp from 
earlier derivation, and the geodesic 02, a line segment from ^ log(det(i\)) to ^ log(det(P2)). is 
02= (1 iog(det(i\)) -|-fd log(det(P2))- Therefore, with simple calculation, the matri¬ 
ces in P corresponding to the geodesic path 0 are 0 (f) = det(i\)^/” ( dlt(Fi) ) g 

P. 
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Parallel transport of tangent vectors along geodesic paths on P: If 1 / is a tangent vector to 

V at P, where P is identified with P = (P, x) and x = ^ log(det(P)), we can express V as 

V = (V, v) with V being a tangent vector of V at P. The parallel transport of V is the parallel 
transport of each of its two components in the corresponding space, and the transport of v is 
itself. The other part, i.e. the parallel translation of F in P has been dealt earlier. 

Riemannian curvature tensor on P: By identifying the tangent vector V = (V, t>), the non¬ 
zero curvature tensors are just the part in P. Therefore, the curvature tensor for X = {X, x), 

V = {Y, y) and Z = (Z, z) on Tp(V) is the same as R{X, Y)Z. 

Again, we summarize these mathematical tools on P: 

1 . Exponential map: Give P e P and a tangent vector V G Tp{V). We denote V = (V, v), 
where V G Tp{V), P = P/ det(P)^/” and v = ^ log(det(P)). The exponential map 
expp(l/) is given as expp(l^), where expp(l/) = s/Pe^^~^^P. 

2. Geodesic distance: For any A, A € P, the squared geodesic distance betwe en them is : 

dpiPuP2? = dp(/,Pi2)2+Alog(det(P2))-log(det(A)))^ where Pi2 = 

3 . Inverse exponential map: For any A, A G P, the inverse exponential map exp^^ (A) = 
1 / = (y, v), where V = Pi log(y/A“'AA“') and t; = ^ log(det(P2)) - ^ log(det(Pi)). 

4 . Parallel transport: For any A, A G P and a tangent vector V = {V,v) G Tp^{V), 
the parallel transport of V along the geodesic from A to A is: {P2TI2BT12, v), where 
B = Pf V,Ti 2 = Pr2'Pr'P2 and P12 = y/A“'AA“'- 

5 . Riemannian curvature tensor: For tangent vectors A = (X,x),Y = {Y,y) and Z = 
(Z, z) G Tp{V), the Riemannian curvature tensor P(X, Y)(Z) is the same as R(X, Y)(Z). 

2.3 Differences From Past Riemannian Structures 

Pennec et al. |l 2 ]] and others |l 2 ll| 22 l| 23 ]| and even others before them have utilized a Riemannian 
structure on P that is commonly known as the Riemanian metric in the literature. Here we 
clarify the difference between our Riemannian structure and the structure used in these past 
papers. 

We have used a natural bijection vr : SL{n)/SO{n) -p V given by 7r([G']) = V GG^ to 
inherit the Riemannian structure from the quotient space to P. Its inverse is given by 7r“^ (P) = 
[P]. There is another bijection between these same two sets, defined by ttbUG]) — GG^, and 
with the inverse (P) = ['/P] ■ (Note that an element of P can be raised to any positive power 
in a well-defined way by diagonalizing.) If we use ttb to transfer the Riemannian structure from 
the quotient space SL{n) / SO{n) onto P, we will reach the one used in earlier papers. In other 
words, the structure on the quotient space is the same for both cases, only the inherited structure 
on P is different due to the different bijections. Note that for all P G P, 7rB(7r“^(P)) = P^. It 
follows that the map from P —)■ P defined by P i-> P^ forms an isometry with our metric on 
one side and the older metric on the other. 

The Riemannian metric on SL{n) (trace metric on pulled-back matrices at identity) is in¬ 
variant under right translation by elements of SO{n) and under left translation by elements of 


8 







SL{n). It follows that the induced metric on the quotient space is invariant under the natural 
left action of SL{n) on SL{n)/SO{n). (This action is given by [g, [G]) i-?- [pG]).) It is a well- 
known fact that, up to a fixed scalar multiple, this is the only metric on SL{n)/SO{n) that is 
invariant under the action by SL{n). This aspect is same for both the cases. If we push forward 
the action of SL{n) on the quotient space via tt, the resulting action of SL{n) on V is given by 
(G, P) 1-^ V GP‘^G^ , but if we push forward the action via tt^, the resulting action of SL{n) 
on V is given by (G, P) i-?- GPG^. 

Given a simple relationship between the two Riemannian structures, one may conjecture 
that the relevant formulae (parallel transport, curvature tensor, etc) can be mapped from one 
setup to another. While this is true in principle, the practice is much harder since it involves the 
expression for the differential of this map which is somewhat complicated. 


3 Analysis of Trajectories on V 

Now that we have expressions for calculating relevant geometric quantities on V, we return to 
the problem of analyzing trajectories on V. In the following we derive the framework for a 
general Riemannian manifold M, keeping in mind that M = V in our applications. 


3.1 Representation of Trajectories 

Let a denote a smooth trajectory on a Riemannian manifold M and P denote the set of all such 
trajectories: P = {a : [0, 1] ^ M\a is smooth}. Define T to be the set of all orientation 
preserving diffeomorphisms of [0,1]: T = {7 : [0,1] ^ [0,1] [7(0) = 0, 7(1) = 1, 7 is a 
diffeomorphism). T forms a group under the composition operation. If a is a trajectory on 
M, then a o 7 is a trajectory that follows the same sequence of points as a but at the evolution 
rate governed by 7. More technically, the group T acts on x T P, according to 

{a, 7) = a o 7. 

We introduce a new representation of trajectories that will be used to compare and register 
them. We assume that for any two points «(ri), «(r 2 ) G M, ri ^ r 2 , we have a mechanism 
for parallel transporting any vector v G Ta(Ti){M) along a from q;(ti) to «(r 2 ), denoted by 

('^)a(Ti)^a(T2)- 


Definition 1 Let a : [0,1] —>■ M denote a smooth trajectory starting with p = q;(0). Given 
a trajectory a, and the velocity vector field a, define its transported square-root vector field 
(TSRVF) to be a scaled parallel-transport of the vector field along a to the starting point p 

denotes the norm 


according to: for each t G [0,1], ^(t) = (—?=^)a(T)->p G Tp{M), where 

V l“L)l 

that is defined through the Riemannian metric on M. 


Note the difference in this definition from the one in ifT^ where the parallel transport was along 
geodesics to a reference point c. Here the parallel transport is along a and to the starting point 
p. The concept of parallel transportation of velocity vector along trajectories has also been used 
previously in IfTTlI and others. This reduces distortion in representation relative to the parallel 
transport of ifT^ to a faraway reference point. 
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This TSRVF representation maps a trajectory « on M to a curve q in Tp{M). What is the 
range space of all such mapping? For any point p e M, denote the set of all square-integrable 
curves in Tp{M) as Cp = L^([ 0 , 1 ], 7 ^(M)). The space of interest, then, becomes an infinite¬ 
dimensional vector bundle C = which is the indexed union of Cp for every p e M. 

Note that the TSRVF representation is bijective: any trajectory a is uniquely represented by a 
pair {p, q{-)) G C, where p E M is the starting point, g G Cp is its TSRVF. We can reconstruct 
the trajectory from (p, q) using the covariant integral defined later (in Algorithm|^. 

3.2 Riemannian Structure on C 

In order to compare trajectories, we will compare their corresponding representations in C and 
that requires a Riemannian structure on the latter space. Let ai,a2 be two trajectories on M, 
with starting points pi and p2, respectively, and let the corresponding TSRVFs be qi and q2. 
Now tti, «2 are represented as two points in the vector bundle {pi,qi), (P2,92) G C over M. 
This representation space is an infinite-dimensional bundle, whose fiber over each point p in M 
is Cp. 

We impose the following Riemannian structure on C. For an element in C, where 

X E M,v E Cj;, we naturally identify the tangent space at {x, v) to be: T(x,v)i^) — %{M)E)Cx. 
To see this, suppose we have a curve in C given by (a:(s), r»(s, r)), s, r G [ 0 , 1 ]. The velocity 
vector to this curve at s = 0 is given by (xs( 0 ), Va;^r’( 0 , •)) G Tx{M) © Ca,, where Xg denotes 
dx/ds, and denotes covariant differentiation of tangent vectors. The Riemannian inner 
product on C is defined in an obvious way: If (ui, tci(-)) and (u2, w'2(-)) t)oth elements of 

'T{x,v)iC) = Tx{M) © C^, define 



( 2 ) 


where the inner products on the right denote the original Riemannian metric in Tx{M). 

Next, for given two points {pi,qi) and (p2, ^2) on C, we are interested in finding the geodesic 
path connecting them. Let {x{s),v{s)) ,sE [ 0 , 1 ] be a path with (a:( 0 ),s( 0 )) = (pi,gi) and 
(x(l), s(l)) = {p2,q2)- We have the following characterization of geodesics on C. 

Theorem 1 A parameterized path [ 0 , 1 ] —?• C given by s ^ {x{s),v{s,t)) on C (where the 
variable r corresponds to the parametrization in Cx), is a geodesic on C if and only if: 


^Xs^s + Jo V, \^xs^)(^s)dT 

^Xs Xs'^){Si' t) 


0 for every s, 

0 for every s,T. 


( 3 ) 


Here denotes the Riemannian curvature tensor, Xg denotes dx/ds, and Vx^ denotes 

the covariant differentiation of tangent vectors on tangent space Tx(s){M). 

Proof: We will prove this theorem in two steps. 

( 1 ) First, we consider with a simpler case where the space of interest is the tangent bundle TM 
of the Riemannian manifold M. An element of TM is denoted by (x, v), where x E M and 
V E Tx{M). It is natural to identify T(x,v){TM) = Tx{M) © Tx{M). The Riemannian inner 


10 


product on TM is defined in the obvious way: If {ui,wi) and {u2,W2) are both elements of 
T(x,v){TM), define 

((mi, tWl), (U2, W2)) = Ui • U2 + Wi ■ W2 


and, again, the inner products on the right denote the original Riemannian metric on Tx{M). 
Suppose we have a path in / TM given by s i-?- (x(s), u(s)). We define the energy of this 
path by 

E = f (Xs • Xs + VxsV ■ Vx,v)ds. 


Jo 

The integrand is the inner product of the velocity vector of the path with itself. It is a stan¬ 
dard result that a geodesic on TM can be characterized as a path that is a critical point of this 
energy function on the set of all paths between two fixed points in TM. To derive local equa¬ 
tions for this geodesic, we now assume we have a parameterized family of paths denoted by 
(x(s, t), x(s, t)), where s is the parameter of each individual path in the family (as above) and 
the variable t tells us which path in the family we are in. Assume 0 < s < 1 and t takes values 
on {— 5 , 5 ) for some small 5 . We want all the paths in this family to start and end at the same 
points of TM, so assume that (x( 0 , f), x( 0 , t)) and (x(l, t), x(l, t)) are constant functions of t. 
The energy of the path with index t is given by: 


E(t) = 


^Xs'^ ' ^Xs^')ds . 


Jo 

To simplify notation in what follows, we will write for and V* for Vxf To establish 
conditions for (x, v) to be critical, we take the derivative of E{t) with respect to f at f = 0 : 


£’'(0) = 2 f [{VtXs ■ Xs) + (Vti^sv) ■ Vsv)]ds . 

Jo 

We will use two elementary facts: (a) Vtixg) = Vsixt) and (b) R{xt,Xs){v) = VtCVsv) — 
’Vsi'^tv), without presenting their proofs. Plugging these facts into the above calculation then 
makes E'{ 0 ) equal to: 

2 f [VgXt ■ Xs + R{xt, Xs){v) ■ VsV + Vs(Vtx) • Vsv]ds 
Jo 

=2 / [(-VsX^ • xt) + R{xt, Xs){v) ■ VsV + (-Vs(Vsx) • Vtv)]ds. 

Jo 

The second equality comes from using integration by parts on the first and third term, taking 
into account the fact that Xt and Vtx vanish at s = 0,1, (since all the paths begin and end at 
the same point). Now, using the standard identities R{X, Y){Z) • W = R{Z, W){X) • Y and 
R{X, Y)(Z) • W = -R(X, Y){W) ■ Z, we obtain: £'( 0 ) equals 


2 / [{-XsXs-Xt)Y{-R{v,Xsv){xs)-Xt) 

Jo 

+(-Vs(Vsx) • Xtv)]ds 

=-2 / [(VsXs£(x, Vsx)(xs)) • Xt-I-(Vs(Vsx) • Vtx)]ds 
Jo 

=-2/ {XgXs + R{v,Xsv){xs)) ■ Xtds - 2 I Xs{Vsv) ■ XfV ds . 
Jo Jo 


11 



Now, (x{s),v{s)) is critical for E if and only if £’'( 0 ) = 0 for every possible variation Xt of 
X and Vf(r;) of v, which is clearly true if and only if 

Vs^s + R{v, 'Vsi^){xs) = 0 and = 0. 

Thus we have derived the geodesic equations for TM. 

( 2 ) Now we consider the case of the infinite dimensional vector bundle C —?• M whose fiber 
over X e M is I = [ 0 , 1 ]. A point in C is denoted by {x,v{t)), where the 

variable r corresponds to the /-parameter in L^(/, 7 ^(M)). The tangent space to C at {x, v{t)) 
is Tx{M) 0 L^(/, 7 ^(M)). Suppose {ui,wi{t)) and {u2,W2{t)) are elements of this tangent 
space and we use the Riemannian metric: 


((Mi, U)i(t)), («2, W2{t))) =Ui-U2 + 


Wi{t) • W 2 {t) dr. 


Now we want to work out the local equations for geodesics in C. A path in C is denoted by 
{x{s),v{s, r)). The energy calculation is basically the same as above but surround everything 
with integration with respect to r. So, it starts out with 


E= r(. g * Xg I \/ gX * \/ gV dx I ds 
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'0 


1 rl 



{xg ■ Xg 0 VgV • Vgv) dsdr. 


0 JO 


(Of course Xg ■ Xg does not involve the parameter r, but surrounding it with fg... dr does not 
change its value!) 

In order to do the variational calculation, we now consider a parametrized family of such 
paths, denoted by (a:(s, t, r)) where we assume that x(0, t) and x(l, t) are constant func¬ 
tions of t, and for each r, i>(0, t, r) and r) are constant functions of t, since we want every 

path in our family to start and end at the same points of C. 

Then, following through the computation exactly as in earlier case, we obtain 


£'(0)=-2^ (VgXg + R{v,V, 


v){xs) dr] ■ Xt ds 


—2 / f VsCVsv) ■ Vtv drds. 
Jo Jo 


In order for our path (x{s),v{s, r)) to be critical for E, E'( 0 ) must vanish for every variation 
Xt(s) of x(s) and Vt(v(s, r)) of v(s, r), which is clearly true if and only if 


0 / R(v, Vsv)(xs) dr = 0, for every s 

Jo 

Vs(Vsf) = 0, for every s and every r . 


Q.E.D 
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Figure 2 : Examples of geodesic between two trajectories on 


The geodesic path (x{s),v{s)) can be intuitively understood as follows: ( 1 ) x{s) is a base¬ 
line curve on M connecting pi and p2, and the covariant differentiation of Xg at the tangent space 
of %{s) (M) equals the negative integral of the Riemannian curvature tensor R{v{s, r), t)){xs) 

with respect to r . In other words, values of v at each r equally determine the geodesic accel¬ 
eration of x{s) in the first equation. ( 2 ) The second equation leads to a fact that v is covariant 
linear, i.e. v{s,t) = a(s,T) -I- s6(s,r) and = Vx^b = 0 for every s and r. For a 

geodesic path connecting (pi, qi) and (p2, Q2), h is natural to let o(s, r) = qi{T)x(o)^x(s) and 
b{s, r) = w{t)x{o)^x{s), where gi(r)^(o)^x(s) and w(r)^(o)^x(s) represent the parallel transport 
of gi(r) and w(r) along a:( 0 ) to x{s), and w is the difference between the TSRVFs q2 and qi 
in Tx{o){M), defined as (fZ2)a:(i)^-a;(o) “ la Fig- we illustrate the geodeisc path between 
two trajectories on a simpler manifold M = In each case, the yellow solid line denotes 
the baseline x(s) and the intermediate lines are the covariant integrals (in Algorithm]^ of v(s) 
with starting point a:(s). As comparison, the dash yellow line shows the geodesic between the 
starting points pi and p2 on 

Theorem [T] is only a characterization of geodesics but does not provide explicit expressions 
for them. In the following section, we develop a numerical solution for constructing geodesics 
on C. 


3.3 Numerical Computation of Geodesic in C 

There are two main approaches in numerical construction of geodesic paths on manifolds. The 
first approach, called path-straightening, initializes with an arbitrary path between the given 
two points on the manifold and then iteratively “straightens” it until a geodesic is reached. The 
second approach, called the shooting method, tries to “shoot” a geodesic from the first point, 
iteratively adjusting the shooting direction, so that the resulting geodesic passes through the 
second point. In this paper, we use the shooting method to obtain the geodesic paths on C. 

In order to implement the shooting method, we need the exponential map on C. Given a 
point {jp, q) E C and a tangent vector (u, w) G 7 (p,q)(C), the exponential map {s{u, w)) 

for s G [ 0 , 1 ] gives a geodesic path (a:(s), t»(s)) on C. Equation]^ helps us with this construc¬ 
tion as follows. The two equations essentially provide expressions for second-order covariant 
derivatives of x and v components of the path. Therefore, using numerical techniques, we can 
perform covariant integration of these quantities to recover the path itself. 
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Algorithm 1 Numerical exponential map on C 

Let the initial point be (x( 0 ), t’(O)) G C and the tangent vector be {u, w) G T(x(o) ..( 0 ))(C). 

'We have a:s( 0 ) = u, Va;jt'(s)|s=o = w. Fix an e = 1 /n, the exponential map {x{ie),v{ie)) = 
^W(x( 0 ),v( 0 )) w))(i = !,■■■ ,n) is given as: 

1 . Set x{€) = exp3.(Q)(eXs(0)), where a:s( 0 ) = u, and v{e) = + ew^^), where and 

are parallel transports ofv{ 0 ) and w along path xfrom x( 0 ) to x{e), respectively. 

2 . For each i = l, 2 ,...,n-l, calculate 

Xs{ie) = [xs{{i - l)e) + , 

where Vx.Xsiii - l)e) = -R{v{{i - - l)e)) - l)e)) is given by 

the first equation in Theorem^ It is easy to show that R{v{{i — l)e), Va;^t'((z — l)e)) = 

R (t;ll + e{i - l)wll, = R (vH, wH), where = v{ 0 )x{o)^x({i-i)e), andw^^ = Wx(Q)^x({i-i)e)- 

3 . Obtain x((i + l)e) = {eXs{ie)), and v{{i + l)e) = + (i + where 

^(O)x(O)—>x((i+l)e)> and Wx(Q'^^x{(i+l)e)- 

Once we have a numerical procedure for the exponential map, we can establish the shooting 
method for finding geodesics. Let (pi, qi) be the starting point and {p2, 52) be the target point, 
the shooting method iteratively updates the tangent or shooting vector {u, w) on T{p^,q^){G) such 
that exp(pj gj) (( m , w)) = {p2, ^2)- Then, the geodesic between {pi,qi) and (p2, Q2) is given by 
(x(s), ?;(s)) = exp(p^ w)), s G [ 0 , 1 ]. The key step here is to use the current discrepancy 

between the point reached, exp^^^ {{u,w)), and the target, (^2,^2). to update the shooting 
vector («, w), at each iteration. There are several possibilities for performing the updates and 
we discuss one here. Since we have two components to update, u and w, we will update them 
separately: ( 1 ) Fix w and update u. For the u component, the increment can come from parallel 
translation of the vector exp^^(p2) (the difference between the reached point p and the target 
point P2) from p to pi, where p is the first component of reached point exp^^^ w)). ( 2 ) 

Fix u and update w. For the w component, we can take the difference between q2 and the 
second component of the point reached (denoted as q) as the increment. This is done by parallel 
translating q to Tp^iM) (the same space as ^2) and calculate the difference, and then parallel 
translate the difference to Tp^ (M) to update w. 

Algorithm 2 Shooting algorithm for calculating geodesic on C 

Given {pi,q2), {P2, ^2) G C, select one point, say (pi, qi), as the starting point and the other, 

(p2, q2), CIS the target point, the shooting algorithm for calculating the geodesic from {p\,q\) to 
(P2, 52 ) is: 

1 . Initialize the shooting direction: find the tangent vector u at pi such that the exponential 

map expp^ {u) = p2 on the manifold M. Parallel transport q2 to the tangent space ofpi 
along the shortest geodesic between pi and p2, denoted as g|. Initialize tc = — pi. 

Now we have a pair {u, w) G 7 (pi, 5 i)(C). 

2 . Construct a geodesic starting from (pi,5i) in the direction {u,w) using the numerical 
exponential map in Algorithm^ Let us denote this geodesic path as {x{s),v{s)), where 
s is the time parameter for the geodesic flow. 
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3 . If{x{l),v{l)) = {p2, q2), we are done. If not, measure the discrepancy between (a:(l), t>(l)) 
and (p2, ^2) using a simple measure, e.g. Lf distance. 

4 . Iteratively, update the shooting direction {u, w) to reduce the discrepancy to zero. This 
update can be done using a two-stage approach: (I) fix u and update w until converge; 
(2) fix w and update u until converge. 

Recall that trajectories on M and their representations in C are bijective. For each pair 
(p, q) e C, one can reconstruct the corresponding trajectory a using covariant integration. A 
numerical implementation of this procedure is summarized in Algorithm]^ 

Algorithm 3 Covariant integral of q along a 

Given a TSRVF q sampled at T times {t 5 \t = 0 , 1 ,..., T — 1 }, 5 = 1 /T, and the starting 
point p: 

1. Set «( 0 ) = p, and compute a{ 5 ) = exp^(o)('^ 5 ( 0 )k( 0 )l)» where exp denotes the exponen¬ 
tial map on M. 

2 . Fort= l,2,...,r- 1 

(a) Parallel transport q{td) to a{t 5 ) along the current trajectory from q;( 0 ) to a{tS), 
and call it q^^{tS). 

(b) Compute a{{t + 1 )( 5 ) = exp„(j5)(5gll(t5)|gll(t5)|). 

Algorithm allows us to calculate the geodesic between two points in C. So, for each 
point along the geodesic {x{s),v{s)) in C, one can easily reconstruct the trajectory on M using 
Algorithm]^ Here, one sets x(s) as the starting point and v{s) as the TSRVF of the trajectory. 
Fig. 1 ^ shows one example of calculating geodesic using the numerical method in Algorithm 
1 ^ where M = V, the set of 3 x 3 SPDMs. In this plot, each SPDM matrix is visualized 
by an ellipsoid and a trajectory on P by a sequence of ellipsoids. The left panel shows two 
original trajectories «i and «2 (their representations are {pi,qi) and (p2, 42)), the end point of 
the trajectory shot (exp^^^ w)), and baseline path x(s). In this case, we selected {pi,qi) 

as the starting point and computed the shooting direction {u,w) such that exp^^^ tt;) = 
(P2, 42)- The bottom panel shows the evolution of norm between the shot trajectory and the 
target (p2, 42) during the shooting algorithm. 


3.4 Geodesic Distance on C 

Using the natural Riemannian metric on C (defined in Eqn. |^, the geodesic distance between 
the two points is defined as the following. 

Definition 2 Given two trajectories ai, 02 und their representations (pi, Qi), (p2, 42) G C, and 
let (x(s), v(s)) G C, s G [ 0 , 1 ] be the geodesic between (pi, qi) and (p2, 42) on C, the geodesic 
distance is given as: 


4((pi,gi), (P2,g2)) 





( 4 ) 
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Original trajectories: (pi,qi) 


(P 2 . <72) 




Baseline path: x(s) Shot trajectory: exp(p^^j)((M,w)) 




Figure 3 : Example of calculating geodesic using shooting method for trajectories on V. 


This distance has two components: ( 1 ) the length between the starting points on M, — 
|i:(s)|ds; and ( 2 ) the standard norm on Cp^ between the TSRVFs of the two trajectories, 

where g| represents the parallel transport of qi G Cp^ along x to Cp^. Since we have a numerical 
approach for approximating the geodesic, this same algorithm can also provide an estimate for 
the geodesic distance. 

3.5 Geodesic Distance on Quotient Space C/F 

The main motivation of using TSRVF representation for trajectories on M and constructing the 
distance dc to compare two trajectories comes from the following. If a trajectory a is warped 
by 7, resulting in a o 7, what is the TSRVF of the time-warped trajectory? The new TSRVF is 
given by: 


(f\^( (^( 7 (^)) 7 (^)) ^ ^ 

=Qa{^{t))VW) = (la * 7 )(^) • 

Theorem 2 For any two trajectories ai, 02 G T and their representations {pi,qi), {p2, ^2) G 
C, the metric dc satisfies dc{{pi, gaio^), (P2, q<x20'r)) = 4 ((pi, Qi), (P2, q2)), for any 7 G F 

Proof: First, if a trajectory is warped by 7 G F, the resulting trajectory is a o 7, and the 
starting point does not change. If the original representation of a is (p, q) G C, the resulting 
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representation is (p, qao'y)- Second, the baseline x(s) connecting the starting points of given two 
trajectories will not change with respect to arbitrary 7’s since the starting points are fixed, so 
for Starting with the left side, we simplify dl{{pi,qa^o'y), (P2, qa20'y)) follows: 

+ fo * 7) W - (ga2 * 

=^x + fo ldi( 7 ('^))\/W) - da 2 ( 7 ('^))VWJl^dt 
=ll + /o| k«i(7W) - qcx2{7{tW7{t) dt 
=^l + fo l^i(s) - 52(s)P ds = d^((pi,qi), (P 2 ,q 2 )) , 
where we have used s = 7(f). 

Theorem|^reveals the advantage of using TSRVF representation: the action of F on C under 
the metric dc is by isometries. The isometry property of time-warping action under the metric dc 
allows us to compare trajectories in a manner that the comparison is invariant to the time warp¬ 
ing. This is achieved through defining a distance in the quotient space of reparameterization 
group. 

To form the quotient space of C modulo the re-parameterization group, we first introduce F 
as the set of all non-decreasing, absolutely continuous functions 7 on [0,1] such that 7(0) = 0 
and 7(1) = 1 . This set is a semigroup with the composition operation (it does not have a well- 
defined inverse). It can be shown that F is a dense subset of F. Consequently, the orbit of a 
TSRVF q under the action of F is exactly the same as the closure of the orbit of q under the 
action of F. Since the orbits under F form closed sets m, while those under F do not, we 
choose to work with the former, at least for the formal development. But in practice, we will 
approximate the theoretical solutions using the elements of F. We define the quotient space 
C/F as the set of all orbits under the action of F, with each orbit being: 

[(p,g)] = (p, [g]) = {(p, (go7)y^)|7 g f} . 

To understand the orbit, one can view it as an equivalence class. For any two trajectories ai, 02 
and their representations in C, {pi,qi), (P2,?2)> we define them to be equivalent when: (1) 
Pi = P2; and ( 2 ) there exists a sequence 7* G F such that ^02071 converges to qi. In other words, 
if two trajectories have the same starting point, and the TSRVF of one can be time-warped 
into the TSRVF of the other, using a sequence of time-warpings, then these two trajectories are 
deemed equivalent to each other. Theorem [^indicates that if two trajectories are warped by the 
same 7 function, the distance dc between them remains the same. In other word, the orbits in C 
are “parallel” to each other. 

Our goal is to define a distance such that it is invariant to the time-warping of trajectories. 
This can be achieved by comparing trajectories through their equivalence classes (or the orbits). 
That is, define a metric on the quotient space C/F using the inherent Riemannian metric from 
C. The geodesic distance on C/F is defined as follows. 

Definition 3 The geodesic distance dq on C/F is the shortest distance between two orbits in C, 
given as 

dq{{piAqi])Av 2 Aq 2 ])) 

= inf _ dc{{pi, {qi o 7 i)v^))> (P2, (^2 o 72 )\/^)) 

7i?72£r 

^inf 4 ((pi,gi),(p 2 ,(g 2 0 7 )\/i')) • (6) 
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The geodesic dg between {pi, [gi]) and (p2, [52]) is obtained by forming geodesics between 
all possible cross pairs in sets (pi, [gi]) and (p2, [<12])■ Since the group action is by isometries, 
we can fixed one point, say (pi, gi), and search over all (p2, [g2]) that minimizes Eqn. 

Next, we focus on the problem of pairwise temporal registration between two trajectories. 
Eqn. I^not only defines a metric on the quotient space C/f but also provides an objective 
function for registering two trajectories: the optimal 7* for Eqn. [^aligns «2 to «i. That is the 
point ai{t) is optimally matched with the point 0:2(7*(f))- To solve Eqn. it is equivalent to 
optimize the following equation: 


min 


ll+ kUt) - iq2Mt))rdt 


( 7 ) 


where {x, v) is the geodesic between (pi, gi) and (p2, ga207), and g^ ^ means parallel transport 
gi along x to Cp^. Note that the time-warping 7 acting on «2 changes the underlying geodesic 
(x, v) between two trajectories. Algorithm|^describes a numerical solution for optimizing Eqn. 
I^on a general manifold M. 


Algorithm 4 Pairwise registration of two trajectories on M 

Represent two trajectories ai{t),a2{t) by their TSRVFs, (pi, gi) and (p2, g2). Let 7 * = lid, 
and set itermax = K (a large integer), iter = 1 and a small e > 0 . 


1 . Select one point, say (pi,gi), as the starting point and the other, {P2A2), the target 

point, where g2 denotes * 7), for 7 G E. M this step, let 7 = 7*^. 

2 . Obtain {u,w) G T{p^^q^){€) such that exp^p^ g^-^{s(u,w)) = {x{s),v{s)), s G [ 0 , 1 ] and 
(x(l),u(l)) = (p2,g2). 

3 . Parallel transport g2 to the tangent space Tp^ (M) along x{s), denoted as g|. Align g| to 
gi using Dynamic Programming Algorithm and obtain the optimal warping function 7. 

4 . Update j* = j* o j by composition. If \\l — lid\\ < ^ or iter > itermax stop. Else, set 

42 = {Qa2 * l), = iter + 1 and go back to step 3 . 

Note that Step 2 corresponds to the first argument [x, v) and Step 3 corresponds to the sec¬ 
ond argument 7 in Eqn. | 7 | respectively. The optimization over the warping function in Step 
3 is achieved using the Dynamic Programming Algorithm ll 2 ^ . Here one samples the interval 
[ 0 , 1 ] using N discrete points and then restricts to only piecewise linear 7’s that pass through 
that N X N grid. In Fig. we present one example of aligning two trajectories ai and 0:2 on 
space of 3 X 3 SPDMs. 


Fast Approximation: Since pairwise temporal registration algorithm involves multiple evalu¬ 
ations of the exponential map and dynamic programming alignment, it is not computationally 
efficient. One way to speed up this optimization is to use an approximate method: approximate 
the baseline x(s) connecting two trajectories first (using geodesic between the starting points of 
trajectories) and then align their TSRVFs. Another way is to find an explicit expression for the 
exponential map. This seems possible only for a simple manifold, such as M = S^, but for a 
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Figure 4: Pairwise registration of two trajectories. 


complicated manifold, such as M = V, the analytical expressions are not known. In the exper¬ 
iment section, we use the approximate method to speed up the registration and comparison. 

If we compare the proposed framework with that in ifT^ . we see several advantages. The 
proposed work perseveres the invariance properties achieved in ifT^ . but does not require choos¬ 
ing a reference point. Also, the proposed framework naturally includes the difference between 
the starting points of two trajectories that are ignored in ifT^ . Since the velocity vectors here are 
transported to the starting point of a trajectory, along that trajectory, as opposed to a transport 
to an arbitrary reference point in [IThl . this representation is more stable. 

3.6 Statistical Summarization of Multiple Trajectories 

Since dq defines a metric in the quotient space C/F, this framework allows us to perform sta¬ 
tistical analysis of multiple trajectories in C/F. Given a set of trajectories {a*, i = 1.. .k}, 
we are interested in computing the average of these trajectories and using it as a template for 
registering these trajectories. This sample average is calculated using the notion of Karcher 
mean Il25]l . In the space of C/F, the Karcher mean is defined to be: 

n 

il^p, [n]) = argmin [q]), (pi, [qai])f • 

(p,[9])ec/r 

Note that {pp, [pq]) is an orbit (equivalence class of trajectories) and one can select any element 
of this orbit as a template to help to align multiple trajectories. 
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Simulated trajectories: 



Mean before registration Mean after registration 


Figure 5: Example of calculating the mean trajectory. The upper panel shows simulated trajec¬ 
tories, and the bottom panel shows mean before and after alignment. 


Algorithm 5 Karcher mean 

For each aj, compute its TSRVF Qi, denote as {pi,qi). Let j = 0 be the initial 

estimate of the Karcher mean (e.g. we can choose one of the trajectories). Set small e, ei, 62 > 0. 

1. For i = 1 to n, align each trajectory {pi^qf) to according to Algorithm^ de¬ 

noted as {pi,qi). Algorithm^ also gives us the inverse exponential map: {ui,Wi) = 

2. Compute the average direction: u = ^ '^1=1 ~ k Yli=i 

3. If\\u\\ < Cl and ||tt;|| < 62, stop. Else, update (//^,//^) in the direction of{u,w) using 

exponential map: ,pj^^) = exp|.^i where e is the step size. We often let 

e = 0.5. 

4. Set j = j + 1, return to step 2. 

After obtaining the converged {pp, Pq), one can compute the covariant integral using Algo¬ 
rithm]^ denoted by p, which is the Karcher mean of {oi, 0:2, Fig. j^shows one result 

on calculating the mean of given simulated trajectories. The upper panel shows the simulated 
trajectories. The bottom panel shows mean trajectory in two cases: before alignment and after 
alignment. One can see that after the alignment, the structure of these trajectories are preserved. 
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4 Video-Based Activities Recognition 

Now we turn to evaluation of the framework developed so far on test datasets involving ac¬ 
tion recognition using videos. While there are several application secures involving video- 
based action recognition, we try two applications here: (1) hand-gesture recognition, where one 
classifies the hand motion into pre-determined classes using video data, and (2) visual speech 
recognition, where one classifies phrases uttered using videos of lip movements. 

4.1 Representations of Videos as Trajectories on V 

The video data is typically extremely high-dimensional, a short video with 50-frames could 
have more than a million pixel values. To represent a video or a frame in this video, researchers 
extract few relevant features from the data, forming low-dimensional representations and devise 
statistical analysis of these features directly. One common technique is to extract a covariance 
matrix of relevant features to represent each frame in the video JH |23 [8]|. The definition of a 
covariance feature for a frame is as follows: At each pixel (or patch) of the frame image, extract 
an d-dimension local feature (e.g. pixel location, intensity, spatial derivatives, HOG features, 
etc), and then calculate a covariance matrix of those local features over the whole frame, i.e. 
sum over the spatial coordinates. More precisely , let fx G denote a d-dimensional feature 
at location x in the image I. The empirical estimate of the covariance matrix of S is given by: 
-P := 17| ~ ~ /)^’ where / = jTj Exe/ fx is the empirical mean feature vector. 

The covariance matrix provides a natural way to fuse multiple local features, across the whole 
image. The dimension of the representation space, {dP + d) /2, is much smaller than the image 
size |/|. 

A video of human activities is now replaced by a sequence of covariance matrices. There¬ 
fore, each video is represented by a parameterized trajectory on the space of V, and we can 
utilize the previous framework to analyze these trajectories. Sometime, as in the hand-gesture 
recognition, we can localize features better by dividing the image frame into four quadrants 
and computing a covariance matrix for each quadrant. Then, each frame is represented as an 
element of and the whole video as t a{t) e 

4.2 Hand Gesture Recognition 

Hand gesture recognition using videos is an important research area since gesture is a natural 
way of communication and expressing intentions. People use gestures to depict sign language 
for deaf, convey messages in loud environment and to interface with computers. Therefore, an 
accurate and automated gesture recognition system would broadly enhance human-computer in¬ 
teraction and enrich our daily lives. In this section, we are interested in applying our framework 
in video-base (dynamic) hand gesture recognition. We use the Cambridge hand-gesture dataset 
iTTlI which has 900 video sequences with nine different hand gestures: 100 video sequences for 
each gesture. The nine gestures result from 3 primitive hand shapes and 3 primitive motions, 
and as collected under different illumination conditions. Some example gestures are shown in 
Fig. 1^ The gestures are imaged under five different illuminations, labeled as Setl, Set2, ..., 
Sets. 
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(a) Examples of gestures 




(b) Different illumination conditions 


Figure 6: (a) shows three examples of gestures in the Cambridge hand-gesture database, (b) 
shows the five different illumination conditions in the database. 


In addition to the illumination variability, the main challenge here comes from the fact that 
hands in this database are not well aligned, e.g. the proportion of a hand in an image and 
the location of the hand are different in different video sequences. To reduce these effects we 
evenly split one image into four quadrants (upper-left, upper-right, bottom-left, bottom-right) 
with some overlaps. Each of the four quadrants is represented by a sequence of covariance 
matrices (on the SPDMs manifold V). In this experiment, we use HOG features ll27l to form a 
covariance matrix per image quadrant as follows. We use 2x2 blocks of 8 x 8 pixel cells with 7 
histogram channels to form HOG features. Those HOG features are then used to generate 7x7 
covariance matrix for each quadrant of each frame. Thus, our representation of a video is now 
given hy t a{t) G 

Since we have split each hand gesture into four dynamic parts, the total distance between 
any two hand gestures is a composite of four corresponding distances. For each corresponding 
dynamic part, e.g. the upper-left part, we first the parts across videos (using Algorithm 
individually and then compare them using the metric dq, denoted by d^pi- The final distance is 
obtained using an weighted average of the four parts: d = Xidupi + \ 2 dupr + X^d^owni + X^doownr 
and Ylt=i Xi — 1. For each illumination set, we use different weights, and these weights are 
trained using randomly selected half of the data (90 video sequences) in that set, and the other 
half of the data are used for the testing. Table shows our results using the nearest neighbor 
classifier on all five sets. One can see that after the alignment, the recognition rate has significant 
improvement on every set. Also, we have reported the state-of-art results on this database 
Il28l|^. One can see that our method outperforms these methods. 


22 




Table 1: Recognition results on the Cambridge Hand-Gesture dataset 


Method 

Setl 

Set2 

Set3 

Set4 

Set5 

TCCA 0 

81% 

81% 

78% 

86% 

- 

RLPPQ 

86% 

86% 

85% 

88 % 

- 

PM 1-NN (21 

89% 

86% 

89% 

87% 

- 

PMLSR (21 

93% 

89% 

91% 

94% 

- 

Oiir 

before alignment 

94% 

91% 

90% 

88% 

77% 

will 

after alignment 

99% 

97% 

97% 

96% 

98% 


Improvement 

5% 

6% 

7% 

8% 

21% 


4.3 Visual Speech Recognition 

This application is concerned with visual speech recognition (VSR), or lip-reading, using close- 
up videos of human facial movements. Speech recognition is important because it allows com¬ 
puters to interpret human speech and take appropriate actions. It also has applications in bio¬ 
metric security, human-machine interaction, manufacturing and so on. Speech recognition is 
performed through multiple modalities - the common speech data consists of both audio and 
visual components. In the case the audio information is either not available or it is corrupted by 
noise, it becomes important to understand the speech using the visual data only. This motivates 
the need for VSR. 

The process of visual speech recognition is to understand the words uttered by speakers, 
derived from the visual cues. Movements of the tongue, lips, jaw and other speech related 
articulators are involved to make sound. To represent a video of such dynamic process, we 
extract a covariance matrix for each frame. Now each video becomes a parameterized trajectory 
on the space of V and we can utilize the proposed framework to analysis these trajectories. 
In the experiments reported here, we utilize the commonly used OuluVS dataset ll^ which 
includes 20 speakers, each uttering 10 everyday greetings five times: Hello, Excuse me, I am 
sorry. Thank you. Good bye. See you, Nice to meet you. You are welcome. How are you. Have a 
good time. Thus, totally the database has 1000 videos; all the image sequences are segmented, 
having the mouth region determined by manually labeled eye positions in each frame ll^ . 
Some examples of the segmented mouth images are shown in Fig. 

For each video frame /, we extract a 7 x 7 covariance matrix representing seven features: 
{x,y,I{x,y), |£|, li^l, If^l, The resulting trajectories in V are aligned using Algo- 

rithm|^and compared using distance dq defined in Eqn. In Fig. [^(a), we show some optimal 
7’s obtained to align one video of phrase (“excuse me”) to other videos of the same phrase 
spoken by the same person. One can see that there exist temporal differences in the original 
videos and they need to be aligned before further analysis. In (b), we show the histogram of 
{dc — dqYs (differences between distances before and after alignment). In this case, each person 
has 50 videos, and we can calculate (50 x 49)/2 pairwise distances before and after alignment, 
and their differences. For all 20 persons in this dataset, we have 20 x (50 x 49)/2 = 24500 
such differences. From the histogram of these differences, one can see that after the alignment, 
the distances (dq’s) consistently become smaller. Note that one can choose other features to 
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Figure 7: Examples of down sampled video sequences in OuluVS dataset. The first and second 
row show one person’s two speech samples of the phrase “Nice to meet you’’; the third and 
fourth row show the phrases “How are you’’ and “Good bye’’ uttered by different persons. 
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(b) Histogram of (dc — c?g)’s 



Figure 8: (a) shows the optimal 7’s obtained to align one video of phrase (“excuse me’’) to the 
other four videos of the same phrase spoken by same person, (b) shows we show the histogram 
of {dc — dqYs (differences between distances before and after alignment). 


obtain a better representation perhaps, but the main point here is the improvement in temporal 
alignment and reduced distances between trajectories. 

To compare classification performance with previous methods |l30l|4]|, we perform the ex¬ 
periment on a subset of the whole dataset, which contains 800 video sequences by removing 
some short videos due to the restriction of the method in |l30|. (Although our method do not 
have this restriction, to compare fairly, we perform the experiment in the same subset). Then, 
we perform the Speaker-Dependent Test (SDT, see ll^ for details) on this subset. The recog¬ 
nition rate is calculated based on Nearest Neighbor (NN) classifier. Table shows the average 
first nearest neighbor (INN) classification rate of our method and previous methods. One can 
see that even with a simple classifier, our method has the classification rate of 78.6%, which is 
8.1% better than ||4]|’s. This indicates the advantages of proposed intrinsic method compared 
with the one in One can also see that, there are 37.6% percent of improvement after the 
alignment (registration), which demonstrates the importance of removing temporal difference 
in comparing of the dynamic systems in computer vision. Several other papers have reported 
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Table 2: Comparison of SDT performance on OuluVS database 


Method 

INN Rate 

Zhao et al. ll^ 

70.2% 

Su et al. im 

before alignment 
after alignment 

33.8% 

70.5% 

Our method 

before alignment 

after alignment 

41.0% 

78.6% 


higher classification rates on this dataset by they generally use advanced classier (e.g. SVM, 
leave-one-out cross valuation), additional information (e.g. audio, image depth) and machine 
learning techniques |l32l[33l[34]]. Thus, their results are not directly comparable with our results. 


5 Conclusion 

In summary, we have proposed metric-based approach for simultaneous alignment and compar¬ 
isons of trajectories on V, the Riemannian manifold of covariance matrices (SPDMs). In order 
to facilitate our analysis, we impose a Riemannian structure on this manifold, induced by the 
action of SO{n) and SL{n), resulting in explicit expressions for geometric quantities such as 
parallel transport and Riemannian curvature tensor. Returning to the trajectories, the basic idea 
is to represent each trajectory by a starting point P E V and a TSRVF which is a curve in the 
tangent space TpiV). The metric for comparing these elements is a composed of: (a) the length 
of the path between the starting points and (b) the distortion introduced in parallel translation 
TSRVFs along that path. The search for optimal path, or a geodesic, is based on a shooting 
method, that in itself uses geodesic equations for computing the exponential map. Using a 
numerical implementation of the exponential map, we derive numerical solutions for pairwise 
alignment of covariance trajectories and to quantify their differences using a rate-invariant dis¬ 
tance. We have applied this framework to covariance tracking in video data, with applications 
to hand-gesture recognition and visual-speech recognition, and obtain state of the art results in 
each case. 
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A Riemannian Structure on SL{n) and Its Quotient Space 

We start with SL{n), the set of all n x n with unit determinant. For the identity matrix 
/ e SL{n), the tangent space at identity is given by sl{n) = Ti{SL{n)) = { 74 |tr( 74 ) = 0 }, the 
space of all n X n matrices with trace zeros. The tangent space at any other point G G SL{n), 
TaiSLin)), is given by {GA\A G 5 ((n)}. Each left-invariant vector field on SL{n) is deter¬ 
mined by its value at /: A{G) = GA{I), where G G SL{n) and A{I) G si{n). Note that any 
group naturally acts on itself and we will use this fact for SL{n). 

The lie algebra 5 l(n) can be expressed as the direct sum of two vector subspaces: sl(n) = 
5 o(n) 0 p(n), where so{n) = (^ G s[(n)|A* = —A}, and p(n) = (^ G s((n)|A* = A}. 
The tangent space at any arbitrary point G G SL{n) also has a similar decomposition: for any 
B G TG{SL{n)), we have B = GBs 0 GBp such that Bg G so{n) and Bp G p(n). 

Riemannian metric on SL{n)\ Now we define the Riemannian metric on SL{n) that will 
later be used for inducing a Riemannian structure on V. For B,G E sl{n), we define a metric 
as {B,G)p = ix{BG^). At any other point G G SL{n), the metric is calculated by pulling 
back the tangent vectors B', G' G TaiSLin)) to 5 ((n), by multiplying G~^ on the left, i.e. 
G~^B', G~^G'. Thus, the inner product (or the Riemannian metric) is given by: 

{B',G')^ = ir{{G-^B'){G-^Gj) , (8) 

where G G SL{n), B\ G' G TG{SL{n)). This pullback operation ensures that this metric is 
invariant to the left action of SL{n), i.e. (5, G)^ = {GB, GG)q. In fact, the difference in this 
framework to the Riemannian metric used in |I3 comes from the mapping used for pullback. 
The mapping used in that paper is not left invariant. 

Geodesic paths on SL{n): An important consequence of the S'L(n)-invariance (Eqn. is that 
we can simplify some calculations by transforming our problems appropriately. Eor instance, if 
we need to compute a geodesic between two arbitrary elements Gi, 6*2 G SL{n), then we can 
first solve the problem of computing the geodesic, between / and G 12 , where G 12 = G)"^G 2 , and 
then multiply on the left by Gi. The geodesic between I and G 12 is given by 1 1 -> e^i 2 e(^i 2 “^i 2 ), 
according to ll^ . where A 12 = argmin^gg,^^) — Gi 2 ||f. and || • \\f indicates the 

Frobenius norm. In fact, if G 12 is symmetric and positive definite, then A 12 = logm(Gi 2 ) G 
p(n) and the geodesic has the simple expression 1 so that the desired geodesic between 
Gi and G 2 is f i-?- The geodesic distance between the two points is given by: 

C^S'L(n)(Gl, G 2 ) = dsL(n){IiG 12) = ||^ 12 || • (9) 

Riemannian curvature tensor on SL{n)\ Let X, Y and Z be three tangent vectors at a point 
G G SL{n), and we want to compute the Riemannian curvature tensor R{X, Y){Z). While the 
general form for R{X, Y){Z) is complicated ll35ll . if X = GA, Y = GB and Z = GG where 
A, B and G are elements of p(n), then the tensor is given by: 

R(X,Y)(Z) = ~[[X,Y],Z] = -\g[\A,B],C] , 

where [A, B] = AB — BA. (This use of square brackets is called Lie bracket and should be 
distinguished from the use of square brackets to denote equivalence classes.) 
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Now we consider a quotient space of SL{n) and, using the theory of Riemannian submer¬ 
sion, inherit some of these formulas to this quotient space. Define the right action of SO{n) on 
SL{n) according to: 

SL{n) X SO{n) SL{n), given by {G * S) = GS . 

An orbit under this action is given by [G] = {G5'|5' e 5'0(n)}. The set of these orbits forms 
the quotient space SL{n)/SO{n) = {[GjlG e SL{n)}. Since SO{n) is a closed subgroup of 
SL{n), the quotient space is a manifold. 

Using the Riemannian metric on SL{n), it is easy to specify the tangent bundle of the 
quotient space. The tangent space T[i]{SL{n)/SO{n)) is simply the subspace of Ti{SL{n)) 
which is orthogonal Ti{SO{n)), the space tangent to the orbit at I. It is well known that 
Ti{SO{n)) = so{n). The subspace of sl{n) which is orthogonal to 5 o(n) is p(n). So, we 
have T[i]{SL{n)/SO{n)) = p(n). For any arbitrary G G SL{n), the vector space tangent 
to the quotient space at [G], T[a\{SL{n)/SO{n)) can be identified with the set {GB\B e 
p(n), for any G e [G]}. 

Riemannian metric on SL{n)/SO{n): Now that we have the tangent bundle of the quotient 
space, we can define a Riemannian metric on this quotient space by inducing it from the larger 
space SL{n). This is possible since the action of SO{n) is by isometries under that metric (and 
the fact that SO{n) is a closed set). By isometry we mean that {AS, BS)^^ = {A, B)q, for 
any S G SO{n) and A,Be TG{SL{n)). Therefore, we can induce this metric from SL{n) 
to the quotient space SL{n)/SO{n). For any two vectors A,Be T[g]{SL{ n)/SO{n)), with 

the above identification, we define the metric as: (^A,B^ = ts:{{G~^A)(G~^By), for any 

(5 G [G]. Note that G~^A, G~^B G p(n). 

Geodesic paths on SL{n)/SO{n): The geodesics in the quotient space SL{n)/SO{n) can be 
expressed using those geodesics in the larger space, SL{n), that are perpendicular to every orbit 
they meet. Therefore, a geodesic between the points [Gi] and [G 2 ] in SL{n)/SO{n) is given 
by t !-)■ [Gie^^^y, where A 12 G p(n) such that ^ [G)“^G 2 ]. The last part means that there 
exists an 5'i2 G SO{n) such that = G'^^G 2 Si 2 . The geodesic distance between the two 
orbits is given by: 

C?SL(n)/SO(n)([Gi], [G2]) = dsL{n)/SO{n){[I], [G'12]) = ||^ 12 || • ( 10 ) 

Parallel transport of tangent vectors along geodesics on SL{n) / SO{n ): Let X G 7[g] {SL{n)/SO{n)) 

be a tangent vector that needs to be translated along a geodesic path given by t I-7- [Ge*"^], where 
A G p(n). Let i? G p(n) such that X is identified with GB, where G G [G]. Then, the parallel 
transport of X along the geodesic is identified with the vector field 1Ge*^B along Ge*^ on 
SL{n). 

Riemannian curvature tensor on SL{n)/SO{n): Let X, Y and Z be three tangent vectors 
at a point [G] G SL{n)/SO{n), and we want to compute the Riemannian curvature tensor 
R{X, V)(Z) on the quotient space. Let A, B and G be elements of p(n) such that X = GA, 

Y = GB and Z = GG, where one can use any G G [G] for this purpose. Then, the tensor is 
given by: 

R{X,Y){Z) = -[[X,y],Z] = -G[[A,BIG] , 
where [A, B] = AB — BA as earlier. 
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